Skip to content

feat(preprod): Add snapshots subcommand#3110

Open
lcian wants to merge 17 commits intomasterfrom
lcian/feat/snapshots
Open

feat(preprod): Add snapshots subcommand#3110
lcian wants to merge 17 commits intomasterfrom
lcian/feat/snapshots

Conversation

@lcian
Copy link
Member

@lcian lcian commented Jan 28, 2026

Updated version of #3049 to discuss and iterate on things.

Notable changes:

  • Removed shard_index parameter from the command. I'm not sure what the purpose of that was originally.
  • This uses the new many (batch) API from objectstore_client. All uploads are executed as batch requests, reducing network overhead. Unfortunately, with they way things are implemented now, we will still have to buffer all files in memory before sending the request, as we need to hash their contents to determine the filename. If we could just use the filename as the key in objectstore, it would be much better because that way we could stream the files over.

Note that auth enforcement still needs to be enabled for objectstore, so that's currently blocking this to be used for anything but internal testing.

Ref FS-233

@github-actions

This comment was marked as outdated.

#[derive(Deserialize, Debug)]
pub struct OrganizationDetails {
pub id: String,
pub links: OrganizationLinks,
Copy link
Member Author

@lcian lcian Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint has a bunch more information but what we really care about is the id and links.
The region_url contains the regional API url, e.g. us.sentry.io. This way we can hit that endpoint directly instead of going through control silo (sentry.io).

If I understand correctly I think this information might already be somehow embedded in org tokens, but not in personal tokens. Or maybe I'm confusing this with something else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly I think this information might already be somehow embedded in org tokens, but not in personal tokens

This is 100% correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then I should see how to resolve the id from the token instead of using the API call when we have an org token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, only the region URL is embedded. Org ID is not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss about the need to resolve the org and project IDs in the first place. Almost all other endpoints are capable of accepting both slugs and IDs interchangeably in the backend (Sentry side). To me it seems like a pretty serious limitation if objectstore cannot accept either.

That said, what you mentioned about having arbitrary keys in your other comment makes sense, and I guess it makes it harder to accept both IDs and slugs. Let's see if we can come up with a solution here, though

use anyhow::{Context, Result};

/// Given an org and project slugs or IDs, returns the IDs of both.
pub fn get_org_project_id(api: impl AsRef<Api>, org: &str, project: &str) -> Result<(u64, u64)> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, for all commands a user can provide --project and --org as slugs or IDs.
These utils are needed to get the corresponding IDs, so that objects from the same org/proj all have the same paths in objectstore regardless if the user passed in slugs or IDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be weird to take in Api and it might be possible that these functions should live somewhere else, IDK.
They certainly don't belong to the Api struct as its methods seem to all map 1to1 to Sentry API calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the backend interpret the slugs in to ids instead of the frontend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the backend interpret the slugs in to ids instead of the frontend?

Indeed, I agree that this would be preferable. The current code adds two additional API calls that could be avoided if the backend resolves the slugs/IDs appropriately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what backend and frontend here refers to.
How objectstore works is that the scope (org and project in this case) is an arbitrary sequence of key-value pairs. We recommend using org and project ID, or at least org ID, but that's not mandatory.
Therefore it's not a responsibility of objectstore or its endpoint in the monolith to normalize org and project, we simply take what the user provides. It's responsibility of the user to normalize.

lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indirectly adds a dependency to reqwest and we need to double check what the implications of that are, especially in regard to rustls vs native-tls which could be problematic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also adds a bunch of deps but I think most of them are inevitable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the TLS stuff next week, because I am uncertain what problems you're concerned about.

@linear
Copy link

linear bot commented Jan 28, 2026

let client = Client::new(url)?;

let (org, project) = get_org_project_id(Api::current(), org, project)?;
let session = Usecase::new("preprod")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set an expiration policy here.
I think based on our discussions TTL should be ideal for this usecase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed we want different TTL based on customer/plan. We could introduce a new endpoint in sentry that tells you this info and set the TTL here based on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, for now I think we're good with nothing set just for testing, but I'll track this here for my team to follow up on in the near future: https://linear.app/getsentry/issue/EME-839/endpoint-for-determining-ttl-for-an-org

@lcian lcian marked this pull request as ready for review February 5, 2026 18:32
@lcian lcian requested review from a team and szokeasaurusrex as code owners February 5, 2026 18:32
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

// For other files, use {org}/{project}/{hash}
let hash = compute_sha256_hash(&contents);
format!("{org}/{project}/{hash}")
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON file keys missing project ID causes cross-project collisions

High Severity

JSON file keys use the format {org}/{snapshot_id}/{filename} which excludes the project ID, while non-JSON files use {org}/{project}/{hash} which includes it. Since the command requires --project and is documented as uploading "build snapshots to a project", omitting the project ID from JSON file keys causes different projects within the same organization to overwrite each other's files when using the same snapshot_id and filename.

Fix in Cursor Fix in Web

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll handle this and other business logic related to the emerge snapshot features

files.push(entry_path.to_path_buf());
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hidden directories not skipped, exposing sensitive files

Medium Severity

The WalkDir traversal descends into hidden directories (e.g., .git, .svn, .env.d/) but is_hidden_file only checks if the file's own name starts with .. Files inside hidden directories with non-hidden names (like .git/config, .git/objects/*) are uploaded despite the intent to skip hidden content. This could inadvertently upload sensitive data or significantly bloat uploads with unwanted files.

Additional Locations (1)

Fix in Cursor Fix in Web

.file_name()
.and_then(|name| name.to_str())
.unwrap_or("unknown.json");
format!("{org}/{snapshot_id}/{filename}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON files in subdirectories with same filename overwrite each other

Medium Severity

For JSON files, the key is generated using only file_name() which extracts just the base filename, discarding any subdirectory path. If a user uploads a directory containing subdir1/data.json and subdir2/data.json, both files produce the identical key {org}/{snapshot_id}/data.json, causing the second file to silently overwrite the first and resulting in data loss.

Fix in Cursor Fix in Web

@szokeasaurusrex
Copy link
Member

Hey @lcian, are you ready for me to review this, or are you still planning to check the feedback from Bugbot and/or iterate further?

Copy link
Member

rbro112 commented Feb 10, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

rbro112 added a commit to getsentry/sentry that referenced this pull request Feb 11, 2026
Adds the initial snapshots POST API. This api does a few things and is
intended to be invoked by the CLI:
- Creates `PreprodArtifact`, `PreprodSnapshotMetrics` and
`CommitComparison` DB models
- Creates image metadata based on what's uploaded from CLI
- Stores metadata in objectstore

Notably images are uploaded directly to objectstore from CLI

Tested E2E locally with objectstore and WIP CLI branch
(getsentry/sentry-cli#3110)

Resolves EME-773
jaydgoss pushed a commit to getsentry/sentry that referenced this pull request Feb 12, 2026
Adds the initial snapshots POST API. This api does a few things and is
intended to be invoked by the CLI:
- Creates `PreprodArtifact`, `PreprodSnapshotMetrics` and
`CommitComparison` DB models
- Creates image metadata based on what's uploaded from CLI
- Stores metadata in objectstore

Notably images are uploaded directly to objectstore from CLI

Tested E2E locally with objectstore and WIP CLI branch
(getsentry/sentry-cli#3110)

Resolves EME-773
Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Most of them are minor and optional, but I would call your attention to the comment about memory usage and the potential for leaking tokens.

Let's discuss further next week

Comment on lines +1791 to +1795
#[derive(Deserialize, Debug)]
pub struct OrganizationLinks {
#[serde(rename = "regionUrl")]
pub region_url: String,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use rename_all to avoid duplicating the field name

Suggested change
#[derive(Deserialize, Debug)]
pub struct OrganizationLinks {
#[serde(rename = "regionUrl")]
pub region_url: String,
}
#[derive(Deserialize, Debug)]
#[serde(rename_all = "camelCase")]
pub struct OrganizationLinks {
pub region_url: String,
}

#[derive(Deserialize, Debug)]
pub struct OrganizationDetails {
pub id: String,
pub links: OrganizationLinks,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly I think this information might already be somehow embedded in org tokens, but not in personal tokens

This is 100% correct

Comment on lines +52 to +58
#[derive(Deserialize)]
struct CreateSnapshotResponse {
#[serde(rename = "artifactId")]
artifact_id: String,
#[serde(rename = "imageCount")]
image_count: u64,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l:

Suggested change
#[derive(Deserialize)]
struct CreateSnapshotResponse {
#[serde(rename = "artifactId")]
artifact_id: String,
#[serde(rename = "imageCount")]
image_count: u64,
}
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct CreateSnapshotResponse {
artifact_id: String,
image_count: u64,
}

Comment on lines +104 to +108
// Build manifest from discovered images
let mut manifest = json!({
"app_id": app_id,
"images": {},
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: Why is this being custom-constructed as a JSON object?

I think it might make more sense to make a dedicated struct type for this manifest. Is there any reason why this was not done?

Comment on lines +110 to +122
let images_obj = manifest["images"]
.as_object_mut()
.expect("images object was just created");
for image in &images {
images_obj.insert(
image.hash.clone(),
json!({
"file_name": image.relative_path,
"width": image.width,
"height": image.height,
}),
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l: Following from my comment above, if we make a struct Manifest, this logic could be an implementation on the struct.

lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the TLS stuff next week, because I am uncertain what problems you're concerned about.

lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Before we merge these changes, we should ensure we are depending on a version that's been published to crates.io, not a Git branch.

};

let url = get_objectstore_url(Api::current(), org)?;
let client = ClientBuilder::new(url.clone())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l: clone() appears to be unnecessary

Suggested change
let client = ClientBuilder::new(url.clone())
let client = ClientBuilder::new(url)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l: I would move this to src/utils/api.rs for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l: Would move to src/utils/objectstore.rs for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants